ABSTRACT

In  many  situations  it  is  useful  to  have  a  low-dimensional  representation 
of  the  space  of  distributions.  In  this  report,  one,  two  and  three  dimensional 
representations  are  given  which  are  of  particular  relevance  to  the  study  of 
robust  estimation  of  location  based  on  rank  estimators.  The  distances  are 
defined  as  functions  of  the  asymptotic  relative  efficiency  of  the  most 
efficient  rank  estimator  for  one  distribution  when  used  on  data  from  another 
distribution.  Values  of  these  distance  functions  are  computed  for  a  large 
number  of  pairs  of  distributions  and  multidimensional  scaling  is  used  to  find 
the  low-dimensional  representations. 
Significance and Explanation


There  is  considerable  interest  now  as  to  how  one  should  estimate  the 
center  of  location  of  a  statistical  distribution.  Traditionally  the  sample 
mean  has  been  used,  often  in  conjunction  with  outlier  rejection  rules. 
However,  there  are  often  problems  with  this  procedure.  Recent  interest 
has  focused  on  "robust"  estimators.  This  report  provides  "maps"  of  an 
important  portion  of  the  space  of  statistical  distributions.  These 
maps  are  very  useful  to  those  studying  robust  estimators. 
REPRESENTATIONS OF THE SPACE OF DISTRIBUTIONS IN
ROBUST  ESTIMATION  OF  LOCATION 

David  L.  Hall  and  Brian  L.  Joiner 

1.  Introduction 

It  is  often  useful  to  have  a  measure  of  "closeness"  among  distributions, 
a  way  of  making  more  precise  such  notions  as:  the  normal  and  logistic 
distributions  are  quite  similar,  whereas  the  normal  and  Cauchy  are  quite 
different.  However,  in  an  important  sense,  the  similarity  between  distributions 
is  very  much  a  function  of  the  context  in  which  one  is  working.  In  some 
situations,  such  as  variance  estimation,  the  agreement  between  fourth  moments 
might  be  critical;  in  other  situations  the  relative  heights  of  the  densities 
at  the  medians  might  be  the  most  important  characteristic.  In  this  report 
we  develop  "maps"  of  the  space  of  distributions  based  on  a  measure  of 
distance  between  distributions  that  is  of  particular  relevance  in  the  problem 
of  robust  estimation  of  location,  especially  for  rank  estimators. 

The approach used here is intuitively appealing: if two distributions
are such that the best estimator for one works quite well on data from the
other, the two distributions are, in an important sense, quite close.

Research in this area apparently begins with the work of Hájek and Šidák
(1967). They proposed using as a distance measure a simple function of the
asymptotic relative efficiency (ARE) of the corresponding asymptotically
most powerful rank tests (amprt). Their measure is $(2(1 - \sqrt{\mathrm{ARE}}))^{1/2}$. They did
not, however, pursue the idea much further than this definition. Takeuchi,
Meisner and Wanderling (1973), hereafter called TMW, presented another related
measure, $\sqrt{1 - \mathrm{ARE}}$. They computed distances between some pairs of distributions
and gave a brief discussion of some of the implications of their distance
measure in the context of robust estimation.

* Associate Manager, Statistics and Materials Safeguards Section, Battelle
Northwest Laboratories, Richland, Washington.

** Professor and Director of Statistical Laboratory, Department of Statistics,
University of Wisconsin-Madison.

Sponsored in part by the United States Army under Contract No. DAAG29-80-C-0041.

These  distance  measures  are  not  quite  as  arbitrary  as  they  might  at 
first  seem.  Recall  (Gastwirth,  1966)  that  a  score  function  for  an  amprt  rank 
procedure  can  be  viewed  as  a  vector  through  the  origin  in  Hilbert  space 
and  that  the  square  of  the  cosine  of  the  angle  between  two  such  vectors  is 
the ARE of either one applied to data from the other. Then the Hájek
and Šidák distance is the chord length of this angle for normalized score
functions, and the TMW distance is the sine. We also considered two other
distances, the angle itself and its tangent, but there seemed to be little
practical difference among the four for present purposes.
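
The four candidate distances can be compared numerically. The following is a minimal sketch that assumes only the relation stated above, ARE = cos²θ. The function name `distances_from_are` is hypothetical and does not come from the report.

```python
import math

def distances_from_are(are):
    """Return the four angle-based distances for a given ARE in (0, 1]."""
    if not 0.0 < are <= 1.0:
        raise ValueError("ARE must lie in (0, 1]")
    cos_theta = math.sqrt(are)            # ARE is the squared cosine of the angle
    theta = math.acos(cos_theta)          # angle between the two score vectors
    return {
        "chord": math.sqrt(2.0 * (1.0 - cos_theta)),  # Hajek-Sidak distance
        "sine": math.sqrt(1.0 - are),                 # TMW distance
        "angle": theta,
        "tangent": math.tan(theta),
    }

# Illustrative ARE values only; they are not taken from the report.
for are in (0.99, 0.95, 0.80, 0.65):
    d = distances_from_are(are)
    print(are, {k: round(v, 3) for k, v in d.items()})
```

For any angle below 90° the four values satisfy sine ≤ chord ≤ angle ≤ tangent, and they nearly coincide when the ARE is close to 1, which fits the remark that the choice among them matters little here.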

We  also  note  that  measures  of  distributional  similarity  based  on  ARE 
are  much  simpler  for  rank  procedures  than  for  parametric  procedures.  This 
results from the symmetry of the ARE for rank procedures; that is
ARE  (amprt  for  F  on  data  from  G)  =  ARE  (amprt  for  G  on  data  from  F),  a 
property  not  possessed  by  the  analogous  M  and  L  procedures. 
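
The equality of the two AREs follows from the symmetry of the inner product in the Hilbert-space view. A rough numerical sketch, assuming the familiar optimal score functions for the normal, logistic and double exponential distributions and a simple midpoint rule on (0, 1); the helper names `inner` and `are` are hypothetical:

```python
import math
from statistics import NormalDist

def inner(f, g, n=20_000):
    """Midpoint-rule approximation of the integral of f(u) * g(u) over (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

def are(f, g):
    """Squared cosine of the angle between two (centered) score functions."""
    return inner(f, g) ** 2 / (inner(f, f) * inner(g, g))

normal_scores = NormalDist().inv_cdf                 # optimal for the normal
wilcoxon_scores = lambda u: 2.0 * u - 1.0            # optimal for the logistic
sign_scores = lambda u: 1.0 if u > 0.5 else -1.0     # optimal for the double exponential

print(round(are(wilcoxon_scores, normal_scores), 4))
print(round(are(normal_scores, wilcoxon_scores), 4))  # same value: the inner product is symmetric
print(round(are(sign_scores, wilcoxon_scores), 4))
```

The first two printed values should agree with each other and lie close to the classical 3/π for the Wilcoxon procedure on normal data.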

In  Hall  and  Joiner  (1980b)  the  ARE's  of  rank  estimators  are  computed 
for  a  large  number  of  pairs  of  distributions.  Here  those  efficiencies  are 
converted to distances using the TMW distance (it is easy to prove
this is a true metric) and multidimensional scaling (MDS) is
used to create low dimensional representations.
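
The conversion from efficiencies to distances can be sketched as follows. The three ARE values are the classical ones for the normal, logistic and double exponential distributions and serve purely as illustrative input; they are not the values tabulated in Hall and Joiner (1980b). The function names are hypothetical.

```python
import math
from itertools import permutations

# Illustrative symmetric ARE matrix (assumption: classical textbook values).
names = ["normal", "logistic", "double exponential"]
are = [
    [1.0,           3.0 / math.pi, 2.0 / math.pi],
    [3.0 / math.pi, 1.0,           0.75],
    [2.0 / math.pi, 0.75,          1.0],
]

def tmw_distances(are_matrix):
    """Convert a symmetric ARE matrix into TMW distances sqrt(1 - ARE)."""
    return [[math.sqrt(max(0.0, 1.0 - a)) for a in row] for row in are_matrix]

def satisfies_triangle_inequality(dist, tol=1e-12):
    """Check d(i, k) <= d(i, j) + d(j, k) for every ordered triple of distinct points."""
    n = len(dist)
    return all(dist[i][k] <= dist[i][j] + dist[j][k] + tol
               for i, j, k in permutations(range(n), 3))

dist = tmw_distances(are)
for name, row in zip(names, dist):
    print(f"{name:>18}", [round(d, 3) for d in row])
print(satisfies_triangle_inequality(dist))
```

A numerical check of this kind does not replace the proof that the TMW distance is a metric; it only guards against errors in a computed distance table before it is passed to MDS.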

The  representations  depend  on  the  data  used  and  we  have  chosen  to  use  45 
distributions  which  are  "heavier  tailed"  than  the  normal.  Distributions  with 
light,  uniform-like  tails  were  excluded  because  our  early  MDS  results 
indicated  that,  while  the  relative  locations  of  the  45  heavier  tailed 
distributions could be fairly well approximated in a low dimensional
space,  this  was  not  true  of  mixtures  of  heavy  tailed  and  light  tailed 
distributions.  We  chose  to  model  the  heavier  tailed  distributions  since 
in  our  experience  and  from  comments  in  the  robustness  literature  it  would 
appear  that  heavier  tails  are  more  of  a  problem  in  practical  work. 

The  representations 

Exhibits  1,  2  and  3  give  one,  two  and  three  dimensional  MDS  representations 
of  the  space  spanned  by  the  45  distributions.  All  three  representations  used 
a regression of the form y = βx, the standard MDS measure of STRESS, and started
with four dimensions. Varying these parameters or the choice of distance
measure had little effect on the first two dimensions of the fit but did
noticeably affect the third dimension. Representations obtained with nonmetric
MDS were also very similar to the metric ones used here. More details
on  these  alternative  solutions  are  given  in  Hall  (1980). 
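
For readers unfamiliar with the fitting criterion, the following sketch evaluates STRESS for a given configuration. It assumes that "the standard MDS measure of STRESS" means Kruskal's STRESS formula 1 and that the regression through the origin fits configuration distances as a multiple of the target distances; neither the program nor its exact settings are stated in this report, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distances between all pairs i < j of configuration points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def stress_through_origin(target, points):
    """STRESS formula 1 with fitted values beta * target (no intercept)."""
    d = pairwise_distances(points)
    # Least-squares slope of the regression of d on target through the origin.
    beta = sum(t * x for t, x in zip(target, d)) / sum(t * t for t in target)
    resid = sum((x - beta * t) ** 2 for t, x in zip(target, d))
    return math.sqrt(resid / sum(x * x for x in d))

# Hypothetical target distances for three points, listed in the pair order
# (0, 1), (0, 2), (1, 2), and a trial one dimensional configuration.
target = [0.21, 0.60, 0.50]
trial = [(0.0,), (0.2,), (0.65,)]
print(round(stress_through_origin(target, trial), 4))
```

Because the slope is refitted for each configuration, this criterion is unchanged when all target distances are rescaled by a common factor.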

One  useful  measure  of  the  adequacy  of  these  or  other  representations  is 
obtained  by  considering  the  fraction  of  total  spread  among  the  points  accounted 
for  by  the  fit.  In  the  representations  shown,  one  dimension  accounts  for 
92%  of  the  spread,  two  dimensions  for  99.3%,  three  dimensions  for  99.8%  and 
four  dimensions  for  99.9%.  Thus  for  these  45  distributions  a  fair  amount 
of  accuracy  is  gained  by  going  from  one  to  two  dimensions,  a  bit  more  by 
going  to  three  dimensions,  but  little  is  gained  by  going  to  higher  dimensions. 
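
The report does not define "fraction of total spread" precisely. One plausible reading, shown here only as an assumption, is the share of the total squared deviation about the centroid that falls along each principal axis. The sketch below applies that reading to three points (normal, logistic, double exponential) whose TMW distances come from classical ARE values rather than from the 45 distributions used in the exhibits; the function names are hypothetical.

```python
import math

def embed_triangle(d01, d02, d12):
    """Place three points in the plane so that all three distances are exact."""
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2.0 * d02)
    y = math.sqrt(max(0.0, d01 ** 2 - x ** 2))
    return [(0.0, 0.0), (x, y), (d02, 0.0)]

def spread_shares(points):
    """Shares of total squared spread along the two principal axes."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    half_gap = math.hypot((sxx - syy) / 2.0, sxy)   # half the eigenvalue difference
    total = sxx + syy
    first = total / 2.0 + half_gap                  # larger eigenvalue of the scatter matrix
    return first / total, (total - first) / total

d_nl = math.sqrt(1.0 - 3.0 / math.pi)   # normal vs logistic
d_nd = math.sqrt(1.0 - 2.0 / math.pi)   # normal vs double exponential
d_ld = math.sqrt(1.0 - 0.75)            # logistic vs double exponential
points = embed_triangle(d_nl, d_nd, d_ld)
print([round(s, 3) for s in spread_shares(points)])
```

With only three points two dimensions always reproduce the distances exactly, so the second share plays the role that the gain from one to two dimensions plays in the text.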


Exhibit 1. One dimensional representation. Labels of the 45 plotted points, in the order extracted from the figure:

• λ = -1.0
• Cauchy (t, ν=1)
• Double Exponential
• Logistic-DE n=0.55
• Student's t ν=1.5
• λ = -0.5
• CN σ=10; 10%
• λ = -0.4
• Student's t ν=2
• Mielke r=0.2
• λ = -0.3
• Logistic-DE n=0.70
• Mielke r=0.4
• λ = -0.2
• Student's t ν=3
• CN σ=5; 10%
• Logistic-DE n=0.80
• λ = -0.1
• Logistic-DE n=0.90
• Mielke r=0.8
• CN σ=10; 5%
• Student's t ν=5
• CN σ=3; 10%
• Logistic-DE n=0.95
• Logistic
• Logistic-DE n=0.99
• CN σ=5; 5%
• Student's t ν=10
• Uniform Logistic
• λ = +0.05
• CN σ=2; 10%
• CN σ=3; 5%
• Mielke r=1.5
• CN σ=2; 5%
• Student's t ν=30
• CN σ=5; 2%
• CN σ=10; 2%
• CN σ=3; 2%
• CN σ=3; 1%
• CN σ=5; 1%
• CN σ=2; 2%
• CN σ=2; 1%
• CN σ=10; 1%
• Normal
• λ = +0.14

[Figure] Exhibit 2. Two dimensional representation

[Figure] Exhibit 3. Three dimensional representation


Interpretations  of  Results 

Perhaps  the  most  striking  aspect  of  Exhibits  1,  2  and  3  is  the  much 
clearer  picture  afforded  by  the  2D  representation  over  the  more  conventional 
1D view. The first dimension is by definition the most important and seems
to be quite similar to what is ordinarily thought of as "tail weight."
Exhibit 2 shows, however, that the second dimension (which does not seem to
have any ready interpretation) is almost as important as the first. Having
seen this second dimension we find ourselves reluctant to return to any one
dimensional representation of this space or use any function that attempts
to "order" these distributions. The additional understanding afforded by
the third dimension does not seem to be as important as that provided by the
first two; thus we find ourselves making most use of the 2D figure, referring
to the 3D version only to double check perspectives gained from the 2D
representation.

A  variety  of  features  are  interesting  in  the  2D  and  3D  representations. 
The families flow along relatively smooth curves, with the t and λ families
in close proximity throughout their range. Good agreement was known between t
and λ in other contexts and it was reassuring to see it manifest here.
The logistic, a special case of the λ family, is very close to a Student's
t with about 7 or 8 degrees of freedom and not too close to the normal.
This relates closely to the observation of Mudholkar and George (1978) that
the logistic is closer to a t with 9 degrees of freedom than it is to the
normal. The t and λ families are relatively one dimensional and fall pretty
much along the "tail weight" axis.

The  two  families  that  go  from  the  double  exponential  to  the  logistic 
fall  along  a  line  that  has  about  a  45°  angle  with  the  "tail  weight"  axis 
and  is  almost  perpendicular  to  the  contaminated  normal  range.  Thus  estimators 
or tests based on these two families cannot be expected to do very well on
data from t, λ and contaminated normals. The L-DE family in fact corresponds
to the family of adaptive rank estimators proposed by Policello and Hettmansperger
(1976). Thus it is clear that for contaminated normal data of the sort
considered here, the Wilcoxon procedure (corresponding to the logistic
distribution) does about as well as the best possible adaptive procedure
based on the Policello-Hettmansperger family. For data from the t or λ family,
the Wilcoxon could be beaten slightly by the Policello-Hettmansperger family,
but clearly it is not the best family for t or λ data.

The  median,  which  corresponds  to  the  double  exponential,  is  clearly 
a poor choice for data from almost all of the distributions considered here,
and is especially poor for data from contaminated normals. Thus the median
is resistant, in that the value of an estimate is not sensitive to a few
serious outliers, but is not efficiency robust for data of the sort considered
here. Its efficiency can be quite poor, as low as 65%, corresponding to a
wastage of over one-third of the data.
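
The link between an efficiency of 65% and the loss of over one-third of the data is simple arithmetic, spelled out below under the usual large-sample interpretation of ARE as a ratio of required sample sizes. The function names are hypothetical.

```python
def wasted_fraction(are):
    """Fraction of the sample effectively discarded at a given ARE."""
    return 1.0 - are

def equivalent_sample_size(n, are):
    """Sample size the most efficient estimator would need for the same precision."""
    return n * are

print(wasted_fraction(0.65) > 1.0 / 3.0)      # is more than one-third wasted?
print(equivalent_sample_size(100, 0.65))      # observations effectively used out of 100
```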

A  natural  question  that  arises  is  what  family  would  produce  estimators 
that  would  have  relatively  high  efficiency  over  the  range  of  distributions 
considered here. Clearly none of the families we have considered will work.
From the general shape of the contaminated normal and t families we are led
to  conjecture  that  a  family  of  contaminated  t  distributions  might  be  rich 
enough  to  cover  most  of  this  space,  except  for  data  near  the  very  peaked 
double  exponential.  One  might  find  that  the  amount  of  contamination 
could  be  fixed  at,  say,  5%  and  still  provide  a  contaminated  t  family  rich 
enough to cover most of the space. If so, an adaptive procedure based on
a contaminated t family with varying degrees of freedom and varying scale
for the contaminant might suffice.
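
To make the conjectured family concrete, here is a minimal sampling sketch. It assumes that the contaminant is a t distribution with the same degrees of freedom as the main component, multiplied by a scale factor, and that the contamination rate is fixed at 5%; the report does not specify these details, and the function names are hypothetical.

```python
import math
import random

def student_t(nu, rng):
    """One Student's t draw: standard normal over sqrt(chi-square / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)   # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu)

def contaminated_t(n, nu, scale, eps=0.05, seed=0):
    """Draw n values: t(nu) with probability 1 - eps, scale * t(nu) otherwise."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = student_t(nu, rng)
        out.append(scale * x if rng.random() < eps else x)
    return out

sample = contaminated_t(1000, nu=5, scale=3.0)
print(len(sample), round(sorted(sample)[len(sample) // 2], 3))
```

Varying `nu` and `scale` while holding `eps` fixed gives the two-parameter family suggested in the text; samples of this kind could also serve as input for Monte Carlo studies.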

Selecting a family of estimators that will be rich enough to be useful
is  thus  one  obvious  use  of  these  representations.  Another  important 
use  is  to  help  select  representative  distributions  to  use 
to  generate  the  data  for  Monte  Carlo  and  other  studies  of  the  properties  of 
robust  estimators. 